SUMMARY

The feasibility and utility of controlling the Space Shuttle TV cameras
and monitors by voice has been investigated. The voice control application
concept is related to task scenarios where the operator uses both hands to
control the 50-foot (16-meter) manipulator of the Space Shuttle. The use of
computer-recognized voice commands allows the operator to effectively press
the control buttons of the Shuttle TV cameras and monitors by voice while he
manually controls the Shuttle manipulator. The pilot voice control system
developed at the Jet Propulsion Laboratory (JPL) to test and evaluate the
feasibility of controlling the Shuttle TV cameras and monitors by voice
commands utilizes a commercially available discrete word speech recognizer
which can be trained to the individual utterances of each operator. Successful
ground tests have been conducted with this pilot application system at the
Johnson Space Center (JSC) Manipulator Development Facility (MDF) using a
simulated full-scale Space Shuttle manipulator. The test configuration involved
the berthing, maneuvering and deploying of a simulated science payload in the
Shuttle bay. The handling task typically required 15 to 20 minutes and 60 to
80 commands to 4 TV cameras and 2 TV monitors. The best test runs have shown
96 to 100% voice recognition accuracy. The main conclusions of the tests are:
(i) the application concept offers potential for enhancement of Shuttle
operations; (ii) additional development is needed to achieve operational
accuracy and reliability over a broad user population; (iii) the use of
computer-recognized voice commands can contribute to a better man-machine
system interaction; (iv) human acoustic characteristics and training have a
major impact on system performance. As a conclusion it was decided to conduct
further application tests and to promote the development of a prototype flight
voice command system for future Space Shuttle applications.


I. INTRODUCTION

Efficient on-line decision making for manipulator control requires that
the operator have easy access to the relevant information sources. This
is particularly important when the task requires frequent changes in the
setting of a video system which contains several TV cameras and monitors in
order to obtain the necessary information for manipulator control. In a fully
manual control mode, where both the manipulator and video system are manually
controlled, the operator can often attend either the video system control
panel or the manipulator hand controllers. He cannot do both at one time.
This is equivalent to strictly sequential hand control of the manipulator
and video system.

Altogether seven TV camera mounting locations exist in the cargo bay and
on the manipulator of the Space Shuttle. Two TV monitors, located in the 
Shuttle cockpit, can be used in a split screen mode. Hence, up to four scenes 
can be displayed at one time. From the TV control panel in the Shuttle cock- 
pit any camera can be linked with any monitor, and the pan, tilt, focus, iris, 
zoom and some internal electronic parameters of the cameras can be controlled. 
The control panel contains altogether thirty-three pushbuttons and switches.
(Figure 1.)

The RMS (Remote Manipulator System) operator normally uses both hands to 
control the motion of the Shuttle manipulator as shown in Fig. 2. The left 
hand controls the three translational motions, the right hand controls the 
three orientation motions of the manipulator. The video system control key- 
board is under the left arm of the operator. (A few keyboard switches and 
pushbuttons are visible in Fig. 2.) 

When simultaneous manual operation of the RMS and video system is imprac- 
tical, the manual control of the Shuttle video system requires the execution 
of a complex multi-step process: 

a. Decide which TV camera and monitor should be changed and how. 

b. Stop manipulator motion, set RMS brakes on, and take hands off the 
manipulator hand controllers. 

c. Turn visual attention to the video system control keyboard. 

d. Find the appropriate buttons and switches on the keyboard. 

e. Activate the appropriate buttons and switches and verify the success 
of this action on the keyboard. 

f. Turn visual attention back to the TV monitors. 

g. Verify the success of the desired information change on the monitors;
if not satisfied, repeat the process from step c. If everything is
all right, proceed with step h.

h. Release the brakes, put hands back on the manipulator hand controllers,
and continue the control task.

This process causes a disruption of RMS motion, diverts the operator's
visual attention and manual work, and distracts his mental concentration from
the manipulator control tasks. All these can contribute to lengthening the
whole operation and to increasing operator workload.

The complex process of manual control of the Shuttle video system during
manipulator operations can be considerably simplified by using a
computer-based discrete word voice command system for controlling the TV
cameras and monitors. Since, in effect, the buttons are "pushed by voice" and
the switches are "turned on/off by voice", the entire video system control
process is reduced to the following simple steps:

a. Decide which TV camera and monitor should be changed and how.

b. Say the appropriate word(s).

c. Verify the success of the desired information change on the monitors,
and proceed with the manipulator control task if everything is all
right, otherwise repeat step b.

It can be hypothesized that voice control of the TV cameras and monitors does
not disturb the operator's visual attention and manual control work, and
minimizes mental distraction from the control task. Consequently, the potential
of voice control for enhancing the Shuttle RMS operation was investigated in
this experimental study.

A pilot voice control system was developed at JPL to test and evaluate
the feasibility and utility of controlling the Space Shuttle video system by
computer-recognized voice commands during manual control of the Shuttle
manipulator. The voice control system is briefly described in Section II.
Alternative control vocabularies are presented in Section III. Control tests
conducted at the JSC MDF using the simulated full-scale Space Shuttle
manipulator are described in Section IV. The test results and conclusions are
summarized in Section V.


II. VOICE CONTROL SYSTEM DESCRIPTION

The pilot voice control system developed at JPL to demonstrate and
evaluate Space Shuttle application concepts utilizes VDETS, a commercially
available discrete word speech recognizer. VDETS is essentially a trainable
acoustic pattern classifier that produces a digital code as an output in
response to an input utterance. VDETS is implemented in a Nova 2 minicomputer.

The basic software used in conjunction with VDETS includes a LINC Tape
Operating System (LTOS) and the VOICE Executive. LTOS allows one to edit
programs and save them on a LINC tape, to store voice reference templets on
a LINC tape, and to execute Nova machine language programs. The VOICE
Executive is a Nova machine language core-image program that assembles user
VOICE programs into Nova machine code with embedded calls to the VOICE
Executive. The VOICE programming language allows one to define and develop
application vocabularies and syntaxes and to perform training and recognition.
The VOICE Executive is completely interrupt driven to accommodate real time
response to external events.

The voice control system must be trained to each individual operator
whose voice pattern templets are then stored on LINC tape for recall before
using the system in the recognition mode. Training typically consists of
repeating the vocabulary word set seven times as it is displayed on the
self-scan display unit. The operator wears a headset with a noise cancelling
microphone and adjusts the volume control to accommodate his normal speaking
voice. In the recognition mode, the self-scan display shows the word
recognized by the system in response to the operator's utterance.

The voice command system was connected to the TV camera and monitor
control circuits through a programmable interface for which a Motorola 6802
microprocessor was employed. Whenever an operator said a command word, the
programmed VDETS would send an ASCII code to the interface. The interface
microprocessor would then send the data out over a parallel line to a
hardware decoder which energized one of the 52 wires connected to the video
system control circuits. The 6802 microprocessor also performed some simple
timing and logic functions. For example, some of the switches are momentary
contact switches, while the camera movement toggle switches must be held in
the "on" state until a "stop" command is heard.
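
The timing and logic role of the interface can be illustrated with a minimal
sketch. It is not the 6802 program, which the paper does not reproduce. The
wire numbers, the command tables and the Interface class are hypothetical
names chosen for illustration, and the "reverse" command is not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical wire assignments; the paper does not list the real ones.
MOMENTARY = {"camera-a": 1, "monitor-1": 2}              # pulsed once
HELD = {"pan-right": 10, "tilt-up": 11, "zoom-in": 12}   # held until "stop"


@dataclass
class Interface:
    """Toy model of the timing and logic role of the interface processor."""
    energized: set = field(default_factory=set)   # wires currently held "on"
    log: list = field(default_factory=list)       # (wire, action) history

    def handle(self, word: str) -> None:
        """React to one recognized command word."""
        if word == "stop":
            # Release every wire that a movement command is holding on.
            for wire in sorted(self.energized):
                self.log.append((wire, "release"))
            self.energized.clear()
        elif word in MOMENTARY:
            # Momentary contact switch: energize briefly, then drop.
            self.log.append((MOMENTARY[word], "pulse"))
        elif word in HELD:
            # Movement toggle: keep the wire energized until "stop".
            wire = HELD[word]
            self.energized.add(wire)
            self.log.append((wire, "hold"))
        # Words outside both tables are ignored in this sketch.
```

Feeding the words "pan-right", "stop" and "camera-a" to handle() in that
order holds wire 10, releases it, and then pulses wire 1, which mirrors the
distinction between toggle and momentary switches described above.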


The video system voice control was implemented so that the commands
voiced by the operator did not require verification before execution; they 
were executed immediately. The effect of a misrecognized command was immedi- 
ately visible on the monitor. The operator needed only to voice new commands 
to correct for misrecognition. 

The voice control system ran in parallel with the manual control keyboard 
so that, if required, the operator could always revert to the manual control 
of the video system. The main elements of the voice control system together 
with the overall system implementation are shown in Figs. 3-4. Performance 
was recorded on a printer. 


III. ALTERNATIVE CONTROL VOCABULARIES 

Several different combinations of vocabulary words both with and without 
syntax restrictions were developed and tested. Figure 5 shows a vocabulary 
and syntax which closely follow the words and organization of the keyboard. 

As seen in Fig. 5, the actual TV camera and monitor control words are arranged 
in five groups corresponding to the grouping of buttons and switches of the 
keyboard shown in Fig. 1.

In general, the syntactic organization of command words serves the
purpose of increasing word recognition accuracy. The syntactic organization
limits the number of words to a subset of the total vocabulary that the
speech recognition system has to look up for identification of a spoken
command word. Figure 6 shows a vocabulary with a multilevel syntax. As seen
in Fig. 6, one can construct many subsets of the vocabulary which only
contain two, three or five words. But increased syntactic grouping of words
increases the application rules that the operator must remember and follow.
Note also in Fig. 6 that some of the subset words are very short, e.g., "far",
"in", etc. Very short words have higher misrecognition probability
than the longer words. The words in Fig. 6 are "natural" in the sense that
they closely follow the names or functions of the keyboard buttons and
switches.
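
How a syntax narrows the set of candidate words can be sketched as follows.
This is an illustrative two-level syntax, not the vocabulary of Fig. 6. The
SYNTAX table, the state names and the helper functions are assumptions made
for the example.

```python
# Hypothetical two-level syntax: each state lists the only words the
# recognizer needs to compare an utterance against while in that state.
SYNTAX = {
    "start": ["camera", "monitor", "zoom", "focus"],
    "camera": ["a", "b", "c", "d"],
    "monitor": ["one", "two"],
    "zoom": ["in", "out"],
    "focus": ["near", "far"],
}


def active_words(state: str) -> list:
    """Return the vocabulary subset that is valid in the given state."""
    return SYNTAX.get(state, SYNTAX["start"])


def step(state: str, word: str) -> tuple:
    """Advance the syntax by one word; return (new_state, accepted)."""
    if word not in active_words(state):
        return state, False          # word is outside the active subset
    if word in SYNTAX:
        return word, True            # group word opens its own subset
    return "start", True             # terminal word completes the command
```

After the word "zoom" only two candidates ("in", "out") remain active, which
is the kind of small subset the text refers to. The cost is that the operator
must remember that "far" is not valid until "focus" has been said.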


The training experiments have shown that the operators prefer simple
vocabularies with minimum or no syntactic restrictions. Following this desire,
two vocabularies were constructed, shown in Figs. 7 and 8. Note that many
vocabulary words shown in Figs. 7 and 8 are concatenated words, e.g.,
"zoom-in", "tilt-up", "focus-far", etc. The use of concatenated words
increased recognition accuracy by 6 to 8% and provided smoother and faster
operation performance. The use of a concatenated word requires only one voice
command (e.g., "zoom-in") for an action instead of two words (e.g., "zoom"
and "in").

But some of the words shown in Figs. 7 and 8 are rather lengthy. In some
cases it was necessary for the operators to speak at an unnaturally fast
speech rate to get the entire utterance within the 1.5 second window that the
speech recognition system allows for each spoken word. If the utterance
lasts longer than 1.5 seconds, the recognition accuracy can be poor.

The vocabulary which was used during the tests at the JSC MDF is the
simplest one without syntax shown in Fig. 8. It only contains two words
("stop" or "reverse") which logically must follow the action commands like
"iris-open", "pan-right", etc.


IV. CONTROL TESTS


Control tests were conducted at the JSC MDF in January 1981 to evaluate
the feasibility and utility of controlling the Shuttle TV cameras and monitors
by voice during manual control of the Shuttle manipulator. The task
configuration chosen for the tests was that of handling a Plasma Diagnostic
Package (PDP) payload mock-up by the manipulator in the Shuttle bay. The PDP
was berthed to and deployed from a retention mechanism.

The task started with the manipulator holding the PDP payload mock-up
above the aft cargo bay area (Fig. 9). It was then docked to the retention
mechanism in the aft bay, the time recorded, then deployed from there, moved
and docked to a similar retention mechanism in the forward cargo bay area.
The task ended when the payload was moved back to a starting position above
the cargo bay. The windows of the cockpit were blocked so that the operators
were forced to rely upon the TV cameras and monitors for visual feedback from
the task area. The manipulation task typically required 15 to 20 minutes.

Altogether 48 test runs were performed by four operators, 32 runs with
voice control of the video system. Table 1 summarizes the average number of
video system control commands in both manual and voice control modes. The
average number of voice commands in Table 1 does not include the misrecognized
command words. Table 1 shows that the average number of manual and voice
commands varies from operator to operator. It is interesting to note that the
average command number variation between operators in the voice mode is less
than in the manual mode.

The control tests were performed after six, seven, eight, and nine
training passes for each operator. Where all the training was done at
approximately the same time, six training passes seemed to give the best
results. Any more than this seemed to corrupt the training patterns. The
training was performed by repeating the whole vocabulary sequentially rather
than repeating each word individually. When the tests were performed on a
subsequent day from the training, two extra update training passes seemed to
give the best results. The standard procedure was to save seven primary
training passes on the LINC tape for each operator, and then update these
seven passes just before the system was used in the recognition mode during
the control tests, disregarding the prior updates.

During the tests the typical mode of operation was to first position the
cameras and then concentrate on payload docking. This was true even with the
voice system, although toward the end of the tests three operators were able
to combine a certain amount of camera movement with payload movement as they
became more comfortable with the system.

The voice command system was used not only to select the various cameras
and monitors, but also to control the camera movement and lens parameters
(pan, tilt, focus, iris, zoom). The most troublesome part of the test was to
control camera movement. Here the accuracy was most important since timing
is critical in order to stop the movement at the right time to achieve the
desired results. In most cases the operators preferred to control camera
movement in "low-rate" setting. This was also the preferred setting in
manual control mode. "High-rate" setting was typically used for coarse
movement control.



V. RESULTS AND CONCLUSIONS 


The best individual test runs have shown a recognition accuracy from
96% to 100%. As seen in Table 2, there is relatively large recognition
accuracy variation between the individual operators. Three of the four
operators underwent familiarization training with the voice command system at
JPL two months prior to the tests at JSC. Their recognition accuracy during
the tests at JSC was consistently better than the recognition accuracy of the
fourth operator who learned the use of the voice command system a day
before the tests.

The two "accuracy" columns in Table 2 refer to two methods of computing 
recognition accuracy. In the first column the accuracy is computed without 
the rejected words. In the second column the accuracy is computed by taking
account of the rejected words. That is, rejected words were counted as errors. 
Each percent number belonging to an operator in the columns of Table 2 is the 
result from four individual test runs. 
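
The two computing methods can be sketched as follows. The paper does not
state the formulas explicitly, so this assumes that the first method divides
correct recognitions by the words the recognizer accepted, and the second
divides by all spoken words, rejections included. The function name and the
counts in the usage note are hypothetical, not test data.

```python
def recognition_accuracy(correct: int, misrecognized: int, rejected: int):
    """Return (accuracy without rejects, accuracy with rejects) in percent.

    correct       -- words recognized as the word actually spoken
    misrecognized -- words recognized as some other vocabulary word
    rejected      -- words the recognizer declined to classify
    """
    accepted = correct + misrecognized
    # Method 1: rejected words are left out of the count entirely.
    without_rejects = 100.0 * correct / accepted
    # Method 2: rejected words are counted as errors.
    with_rejects = 100.0 * correct / (accepted + rejected)
    return without_rejects, with_rejects
```

For example, with 90 correct, 6 misrecognized and 4 rejected words (made-up
counts), the function returns 93.75 and 90.0. The second figure can never
exceed the first, which agrees with the ordering of the two columns in
Table 2.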

Table 2 indicates that voice recognition accuracy also depends on the 
vocabulary to some extent. The vocabularies JSCN04 and JSC002 in Table 2 
correspond to the vocabularies shown in Figs. 7 and 8, respectively. But, as 
seen in Table 2, the recognition accuracy of the best scoring operator (opera- 
tor B) was insensitive to both vocabulary variation and accuracy computing 
method.

Several off-line recognition tests (without manipulator control) were 
also performed at JSC with four naive users who had never previously used a 
voice recognition system. Their average recognition accuracy was about 90%.

It is interesting to note that among the four primary operators and four 
naive users there were altogether three female and five male subjects, and the 
average recognition accuracy of the female subjects was 8-^^% higher than the 
average recognition accuracy of the male subjects. It is also noted that only 
one female was used for the on-line tests, and her recognition scores were 
nearly perfect, ranging from 96% to 100%. Of course, these data don't have sta- 
tistical significance since the test subject population was too small. 

The duration of each test run with the voice command system steadily
decreased as each operator became more familiar with the system. The average
time per task in voice control mode was still about 10% longer than in
manual control mode during the tests, which should be regarded as
introductory. It is felt that this time duration average will be reversed
when (i) the operators gain more experience with the voice command system and
(ii) the accuracy of the voice recognition system is improved. It should be
kept in mind that all operators had several years of extensive experience with
the manual operation of the video system. As seen in Table 3, however, there
was a large variation between the average time performance of the four
operators even during the manual operation of the video system.

After becoming more familiar with the system, the operators were impressed
with its potential and enthusiastic about it even though they felt that the
recognition accuracy should be improved. In general, there was an agreement
among the operators that at least 93% average total recognition accuracy is
needed with a 30-word vocabulary in order for the operators to feel
comfortable with the voice command system during real-time operation. In the
total recognition accuracy the rejected words are counted as errors; see last
column in Table 2.

A few interesting general remarks emerged after the tests: 

1) Command words should be added to the vocabulary that will (i) restore
camera and monitor to the condition prior to a recognition error, and
(ii) allow an operator to name a selected camera position once it has
been set up so that it may be re-invoked with a single word instead
of repeating a complete command sequence.

2) Though the commands voiced by an operator did not require verification
before execution, the operators often felt it reassuring to look
at the self-scan display of the recognized words. This display, however,
should be a small device and placed very close to the TV monitors.

3) The operators would like to be able to issue commands other than
"stop" or "reverse" while a camera is moving. This capability would
speed up the operation.

Though the control tests were not meant to test and evaluate a particular
voice recognition system, it should still be mentioned that the VDETS* system
performed very well even in the presence of acoustic and electrical noise.

The main conclusions of the tests are: (i) the application concept offers
potential for enhancement of Shuttle operations; (ii) additional development
is needed to achieve operational accuracy and reliability over a broad user
population; (iii) the use of computer-recognized voice commands can contribute
to a better man-machine system interaction; (iv) human acoustic
characteristics and training have a major impact on system performance. As a
conclusion it was decided to conduct further application tests and to promote
the development of a prototype flight voice command system for future Space
Shuttle applications.

Acknowledgment


The research described in this paper was carried out at the Jet
Propulsion Laboratory, California Institute of Technology, under NASA Contract
NAS7-100. The programmable interface for connecting VDETS to the video
system control circuits was developed by H. C. Primus of JPL.

Note: A seven-minute narrated movie is available which shows the control
tests at the JSC MDF using voice control of the Space Shuttle video system.


* VDETS is carried by Interstate Electronics, Anaheim, CA.


Table 1. Test Summary.

• 4 TRAINED OPERATORS

• 48 TEST RUNS (EACH 15 TO 20 MINUTES)

• 32 RUNS WITH VOICE COMMANDS

            AVERAGE NUMBER OF COMMANDS
OPERATOR    MANUAL MODE    VOICE MODE

A           57             62
B           80             70
C           4[?]           63
D           [illegible]    66


Table 2. Voice Command Recognition Accuracy Summary.

OPERATOR    VOCABULARY    ACCURACY W/O (%)    ACCURACY W (%)

A           JSCN04        [illegible]         90
B           (Fig. 7)      97                  95
C                         89                  86
D                         80                  78
AVERAGES                  91                  87

A           JSC002        92                  89
B           (Fig. 8)      97                  97
C                         86                  83
D                         [illegible]         72
AVERAGES                  89                  85


Table 3. Average Task Durations (Minutes:Seconds).

OPERATOR: A, B, C, D
MODES: MANUAL, VOICE

Values in source order (cell assignment not recoverable from the scan):
14:[?]6, 21:12, 2[?]:56, 20:19, [?]:25, 14:[?]4, [?]:12, 25:[?]



Figure 1. Space Shuttle TV Camera and Monitor Control Keyboard.



Figure 2. Space Shuttle Cockpit Control and Information Environment
for Manipulator Operation.



Figure 3. Main Elements of the Voice Control System and
Overall System Implementation. 



Figure 4. Operator Uses Voice Control of Video System During
Manual Control of Shuttle Manipulator.



Figure 5. Natural Vocabulary with Simple Keyboard Syntax.



Figure 8. Reduced Vocabulary with Concatenated Words and without Syntax.



Figure 9. Task Scene for Voice Control of the Shuttle Video System
During Manual Control of the Shuttle Manipulator.